-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🧞 Fix genai exporter arguments bug and add Genai phi2 example #1061
Conversation
|
||
@staticmethod | ||
def is_accelerator_agnostic(accelerator_spec: AcceleratorSpec) -> bool: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, hardware dependent means need create InferenceSession and need run on target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, like Mike says, a search point can be invalid for a particular ep while the pass is accelerator agnostic.
Pass is only non-agnostic if the output of the pass changes based on the accelerator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, line 81 depends on the accelerator_spec so it satisfies the requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems, we need an extra pass config like target_ep
. Then the is_accelerator_agnostic
can be True.
Synced with Mike, the accelerator_spec
here is target device so current change is fine.
For genai exporter, the output will be different for different ep.
For cuda int4, the input shape is changed to fp16, but for cpu int4, it is still fp32.
Please make sure the search point is always Enum. If there are any possibility to use str for precision, we'd better add |
Describe your changes
With the latest onnxruntime-genai, there are a few bugs:
a.
precision
can only accept string value(int4, fp16, fp32) but not theb.
execution_provider
accept ep value but not device valuec. the pass of
GenAIModelExporter
is not EP agnostic. For cuda int4, the input shape is changed to fp16, but for cpu int4, it is still fp32.Add phi2 example.
Checklist before requesting a review
lintrunner -a
(Optional) Issue link